Network representation learning algorithm incorporated with node profile attribute information

LIU Zhengming, MA Hong, LIU Shuxin, LI Haitao, CHANG Sheng

Journal of Computer Applications 2019, 39 (4): 1012-1020. DOI: 10.11772/j.issn.1001-9081.2018081851

To improve the quality of network representation learning with node profile information, and to address the dispersion and incompleteness of the semantic information carried by node profile attributes in social networks, a network representation learning algorithm incorporating node profile attribute information, named NPA-NRL, was proposed. First, attribute information was encoded by one-hot encoding, and a random-perturbation data augmentation method was introduced to overcome the incompleteness of node profile attribute information. Then, the attribute encoding and the structure encoding were combined as the input of a deep neural network so that the two types of information complement each other. Finally, an attribute similarity measure function based on network homogeneity and a structural similarity measure function based on the SkipGram model were designed to mine the fused semantic information through joint training. Experimental results on three real network datasets (GPLUS, OKLAHOMA and UNC) demonstrate that, compared with the classic DeepWalk, Text-Associated DeepWalk (TADW), User Profile Preserving Social Network Embedding (UPP-SNE) and Social Network Embedding (SNE) algorithms, the proposed NPA-NRL algorithm improves the average Area Under the ROC Curve (AUC) by 2.75% on the link prediction task and the average F1 value by 7.10% on the node classification task.
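
The first step of the abstract (one-hot encoding plus random perturbation) can be illustrated with a minimal sketch. The abstract does not say how the perturbation is performed, so this sketch assumes it means randomly blanking attribute fields so that a model also sees incomplete profiles during training. The names `one_hot_encode`, `perturb`, `vocab` and `drop_prob` are hypothetical and do not come from the paper.

```python
import random

def one_hot_encode(profile, vocab):
    """Concatenate one one-hot slot per attribute field.

    A missing or unknown value leaves its slot all zeros.
    """
    vec = []
    for field, values in vocab.items():
        slot = [0] * len(values)
        value = profile.get(field)
        if value in values:
            slot[values.index(value)] = 1
        vec.extend(slot)
    return vec

def perturb(profile, drop_prob, rng):
    """Assumed augmentation: blank each field independently with probability drop_prob."""
    return {field: (None if rng.random() < drop_prob else value)
            for field, value in profile.items()}

# Hypothetical attribute vocabulary and profile, for illustration only.
vocab = {"gender": ["f", "m"], "school": ["a", "b", "c"]}
profile = {"gender": "f", "school": "b"}

rng = random.Random(0)
original = one_hot_encode(profile, vocab)
augmented = [one_hot_encode(perturb(profile, 0.3, rng), vocab) for _ in range(4)]
```

Each augmented vector has the same length as the original encoding, so the augmented copies could be fed to the same network input as the unperturbed profile.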

Real-time landmark matching algorithm supported by improved FAST feature point

YANG Qili, ZHU Lanyan, LI Haitao

Journal of Computer Applications 2016, 36 (5): 1404-1409. DOI: 10.11772/j.issn.1001-9081.2016.05.1404

Concerning the problem that the requirements on matching time and matching accuracy cannot be met simultaneously in image matching, a method based on feature point matching was proposed. Landmark matching was achieved with Random Forest (RF): the matching problem was converted into a simple classification problem to reduce the computational complexity of real-time image matching. The landmark image was represented by Features from Accelerated Segment Test (FAST) feature points, and the scale and affine invariance of the FAST feature points were improved by a Gaussian pyramid structure and an affine augmentation strategy, which raised the matching rate. Experimental results show that, compared with the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) algorithms, the matching rate of the proposed algorithm reached about 90%, remaining approximately equal to that of SIFT and SURF under scale change, occlusion or rotation, while its running time was an order of magnitude shorter than that of the other two algorithms. The method matches landmarks efficiently, and its running time meets real-time requirements.
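
For readers unfamiliar with FAST, the segment test behind the feature points mentioned above can be sketched as follows. This is the common baseline test only: a pixel is a corner when at least `n` contiguous pixels on a 16-pixel circle of radius 3 are all brighter or all darker than the center by more than a threshold. It does not include the Gaussian pyramid, the affine augmentation or the Random Forest classifier described in the abstract, and the `threshold` and `n` values are assumptions, not the paper's settings.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, threshold=20, n=9):
    """Segment test at (x, y); img is a list of rows of grey levels.

    The pixel must lie at least 3 pixels away from every image border.
    """
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # first "brighter than center", then "darker"
        flags = [sign * (value - center) > threshold for value in ring]
        run = 0
        for flag in flags + flags:  # doubled so runs can wrap around the circle
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False

def detect_fast(img, threshold=20, n=9):
    """Return (x, y) for every interior pixel that passes the segment test."""
    height, width = len(img), len(img[0])
    return [(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_fast_corner(img, x, y, threshold, n)]
```

On a straight vertical edge only 7 of the 16 circle pixels differ from the center, so the test rejects it with `n=9`, whereas the corner of a bright square leaves 11 contiguous darker pixels and is accepted.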

Implementation of decision tree algorithm dealing with massive noisy data based on Hadoop

LIU Yaqiu, LI Haitao, JING Weipeng

Journal of Computer Applications 2015, 35 (4): 1143-1147. DOI: 10.11772/j.issn.1001-9081.2015.04.1143

Concerning that current decision tree algorithms seldom consider how the noise level of the training set affects the model, and that traditional memory-resident algorithms have difficulty processing massive data, an Imprecise Probability C4.5 algorithm based on Hadoop, named IP-C4.5, was proposed. When training the model, the IP-C4.5 algorithm treats the training set used to build the decision tree as unreliable, and uses the imprecise-probability information gain ratio as the split criterion to reduce the influence of noisy data on the model. To enhance its ability to handle massive data, IP-C4.5 was implemented on Hadoop with MapReduce programming based on file splits. Experimental results show that when the training set is noisy, the accuracy of the IP-C4.5 algorithm is higher than that of C4.5 and Complete CDT (CCDT), and the advantage is especially pronounced when the noise level exceeds 10%; in addition, the parallelized IP-C4.5 algorithm on Hadoop is able to handle massive data.
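
The abstract does not define the imprecise-probability information gain ratio. As a rough illustration only, the sketch below assumes one formulation known from the imprecise-probability literature: class probabilities are bounded using the Imprecise Dirichlet Model with a hyperparameter `s`, ordinary entropy is replaced by the maximum entropy over that set of distributions, and attribute-value weights are plain relative frequencies. All function names are hypothetical, and the paper's actual criterion may differ.

```python
import math

def max_entropy_counts(counts, s=1.0):
    """Spread an extra mass s over the smallest counts (water-filling).

    Under the assumed Imprecise Dirichlet Model, this yields the
    maximum-entropy distribution compatible with the observed counts.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    filled = [float(c) for c in counts]
    remaining, k = float(s), 1  # k = number of classes at the lowest level
    while remaining > 0:
        if k < len(order):
            need = (filled[order[k]] - filled[order[0]]) * k
        else:
            need = remaining  # every class is level: share what is left
        step = min(need, remaining)
        for i in order[:k]:
            filled[i] += step / k
        remaining -= step
        if k < len(order):
            k += 1
    return filled

def upper_entropy(counts, s=1.0):
    """Maximum entropy (in bits) over the assumed credal set."""
    filled = max_entropy_counts(counts, s)
    total = sum(filled)
    return -sum((c / total) * math.log2(c / total) for c in filled if c > 0)

def imprecise_gain_ratio(partitions, s=1.0):
    """partitions: one list of per-class counts for each attribute value."""
    parent = [sum(column) for column in zip(*partitions)]
    n = sum(parent)
    weights = [sum(p) / n for p in partitions]
    gain = upper_entropy(parent, s) - sum(
        w * upper_entropy(p, s) for w, p in zip(weights, partitions))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0
```

Under this assumption a node with class counts `[3, 0]` no longer has zero entropy, because the extra mass `s` is assigned to the unseen class. That is how this kind of criterion avoids trusting small, possibly noisy partitions too much.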
